AI029

Reinforcement Learning: An Introduction

Multi-armed Bandits

Lecture

Lesson 2

Date

2026-04-21

Teacher

AI Tutor

Duration

60 Mins

Learning Objectives

Define the k-armed bandit problem framework
Evaluate the exploration-exploitation trade-off
Implement epsilon-greedy and Upper Confidence Bound (UCB) action selection
Analyze incremental update rules for action-value estimation
Compare the performance of bandit algorithms in stationary and non-stationary environments
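
As a preview of the objectives above, the following is a minimal Python sketch of the three building blocks named there: the incremental update Q <- Q + step_size * (R - Q), epsilon-greedy selection, and UCB selection. All function names (incremental_update, epsilon_greedy, ucb, run_bandit) are hypothetical and chosen for illustration only. The reward model (Gaussian rewards with unit variance around fixed true means) is an assumption, not something specified in this lesson outline.

```python
import math
import random


def incremental_update(q, reward, step_size):
    """Move the estimate q toward the observed reward by a fraction step_size."""
    return q + step_size * (reward - q)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties among equally good arms at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def ucb(q_values, counts, t, c):
    """Pick the arm maximizing q + c * sqrt(ln t / n); untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return scores.index(max(scores))


def run_bandit(true_means, steps, select, seed=0):
    """Run one bandit episode; select(q_values, counts, t, rng) returns an arm.

    Assumption: rewards are Gaussian with unit variance around true_means.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q_values = [0.0] * k
    counts = [0] * k
    total = 0.0
    for t in range(1, steps + 1):
        arm = select(q_values, counts, t, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Step size 1/n turns the incremental rule into a sample average.
        q_values[arm] = incremental_update(q_values[arm], reward,
                                           1.0 / counts[arm])
        total += reward
    return q_values, counts, total / steps


if __name__ == "__main__":
    means = [0.2, 0.5, 1.5]  # hypothetical true action values
    eps_rule = lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng)
    ucb_rule = lambda q, n, t, rng: ucb(q, n, t, 2.0)
    for name, rule in [("epsilon-greedy", eps_rule), ("UCB", ucb_rule)]:
        q, n, avg = run_bandit(means, 2000, rule)
        print(name, "pull counts:", n, "average reward:", round(avg, 3))
```

The step size 1/n used here suits a stationary problem, since it weights all past rewards equally. Passing a constant step size to incremental_update instead weights recent rewards more heavily, which is the usual adjustment for non-stationary environments.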